PART I. INTRODUCTION


The availability of a piece of equipment, or of a system of components such as an electric power generator or boiler, a nuclear reactor, or a reactor safety system, is defined as the probability that the system is "up", that is, able to perform its intended mission. Since equipment sometimes fails, system availability can be increased by scheduling inspections, allowing for preventive maintenance, and, when needed, performing corrective repairs. The availability of a system is also enhanced by the introduction of redundancy, i.e., by the use of parallel equipment.

The purpose of this report is to discuss the definition and measurement of availability from a statistical viewpoint. The statistical approach to problems of equipment reliability and availability begins by representing the individual component times between failures, and the subsequent down or repair times, by statistical variables having probability distributions. Aspects of this mathematical modeling step are described in Part II. There it is pointed out, for example, that the long-run availability of an individually maintained unit depends only upon the mean time to failure and the mean repair time of that, or similar, equipment.

Part III of this report considers the probable variability of availability from component to component, and its consequent effect upon system availability. For example, the mean time between failures (MTBF) of a component of a particular type will vary because of manufacturing, environmental, and maintenance differences, and there will be differences in component availability as a consequence. If the variability of the MTBF and of the mean time to repair (MTTR) of a component is represented by probability distributions, as is done in the Reactor Safety Report, WASH-1400, then statistical variability of the system availability is also implied. The problem considered in Part III is that of approximating the probability distribution of the availability of a system of, perhaps, many different components, given the probability distributions of the individual component MTBF and MTTR. With such a probability distribution in hand it is possible to attach a probability to the event that a protective system of components (a reactor safety system, for instance) meets a required availability.

In Part IV, it is shown how failure and repair data, amassed from experience with individual components, can be used to make statistical inferences about the true, but unknown, availability of the component. It is also shown how such data, available for each of the many different components that make up a system, can be employed to infer system availability. The method used, called the jackknife, tends to be insensitive to the mathematical form of the underlying probability distributions of the observed times to failure and times to repair. This property is useful in practice, since the latter distributions are unlikely to be known at all precisely. An example of the application of the jackknife technique to actual failure and repair data obtained from the Humboldt Bay and Yankee nuclear plants is presented in Part IV, Sec. 4.4, where confidence limits for the long-run availability of these two plants are calculated. The jackknife confidence limits are shown to resemble comparable limits obtained by two other methods, but to be slightly narrower than the latter.

The methodologies of Parts III and IV are aimed at solving similar, but not identical, problems. That of Part III addresses the problem of assessing the availability of a system of components before any data on the particular components are available; this is done on the basis of judgment, or of experience with similar components in and from different environments. The procedures of Part IV assess the availability of a particular system that is in operation, and whose components have been in operation, or under test, long enough to furnish some actual failure and repair data.
PART II. ANALYTICAL MODELS FOR AVAILABILITY


2.1 General

In this section several mathematical models are presented for the availability of a complex, repairable, and possibly redundant system. Relevant availability models are reviewed, and methods are suggested for obtaining numerical results from them once the probabilistic properties of the components, such as the probability structure of the failure and repair processes, have been specified. Suggestions are also made for obtaining time-dependent availability information from data on component failure and repair times.

2.2 Availability: One Element

Consider the time history of an item (e.g., a reactor safety system, an entire reactor, or a component of one of the above) that is in one of two states at any time: available or unavailable. For short, say that the unit is "up" when available and "down" otherwise. Suppose that the up time intervals, or times to failure, $\{U_i,\ i = 1, 2, \ldots\}$, are a sequence of independent statistical variables, each having the distribution function $F(x)$; suppose also that the down time intervals $\{D_i,\ i = 1, 2, \ldots\}$ are likewise independently distributed with distribution function $G(y)$. If, furthermore, the sequences $\{U_i\}$ and $\{D_i\}$ are statistically independent of each other, then the random sequence (or stochastic process) $X(t)$ that takes the value unity when the system is up, and zero when it is down, is called an alternating renewal process (see Cox [3]). Finally, $A(t)$, the availability of the system at time $t$, is defined to be

$$
A(t) = \text{Probability the system is up at time } t = P\{X(t) = 1 \mid X(0)\},
$$

where $X(0)$ refers to its condition at some initial time point, denoted by $t = 0$. Explicit mathematical formulas for $A(t)$ will be derived and discussed; these naturally involve properties of the up time and down time distribution functions, $F$ and $G$.

Note 1: Availability at time $t$ depends upon initial conditions: whether the system is up at time $t = 0$, perhaps immediately following repair, or down, immediately preceding repair. Thus it is proper to define availability at time $t$ given the item state at $t = 0$. For instance, the probability that the system is up at time $t$ given its initial state, written

$$
A(t \mid X(0)) = P\{X(t) = 1 \mid X(0)\},
$$

is of interest; $X(0) = 1$ signifies that the item is up at $t = 0$.

Under reasonable conditions $A(t \mid X(0))$ will tend to a constant, $A(\infty) = A$, as time increases. This steady-state availability is independent of the initial conditions, and it is the measure of system effectiveness of principal concern in this report.
Note 2: Availability as described here refers to the probability of item operability at one point in time, $t$. It may also be desirable to calculate an interval availability

$$
A(t, \Delta) = P\{X(t') = 1 \text{ for all } t' \text{ between } t \text{ and } t + \Delta\},
$$

where, for instance, $\Delta$ is the time required for the item to complete its mission (which may be variable, and hence be modelled as a random variable).

Note 3: There may well be interest in system availability at demand, where demands (e.g., nuclear reactor accidents or earthquakes) occur at variable times and can be treated as random variables. For instance, let $T$ be the random time at which a demand, or need, for the safety device occurs; then the demand availability is the mean value of the quantity $A(T)$. It is sometimes easier to calculate this seemingly more complex quantity than it is to calculate simple point availability.
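To illustrate the last remark, suppose (an assumption not made in the text) that the demand time $T$ is exponentially distributed with rate $\theta$ and independent of the up/down process, and that the unit starts up. Then $E[A_U(T)] = \int_0^\infty \theta e^{-\theta t} A_U(t)\,dt = \theta \hat{A}_U(\theta)$, where $\hat{A}_U$ is the Laplace transform (2-2-8) of Section 2.2.2, so by (2-2-8) the demand availability is $(1 - f(\theta))/(1 - f(\theta)g(\theta))$ for any $F$ and $G$, with no transform inversion needed. The sketch below checks this shortcut for exponential up and down times by brute-force quadrature of $A_U(t)$ from (2-2-5); the function names and rate values are hypothetical.

```python
import math


def demand_availability_transform(lam: float, mu: float, theta: float) -> float:
    """E[A_U(T)] = theta * A_U_hat(theta) = (1 - f) / (1 - f g), via (2-2-8)."""
    f = lam / (lam + theta)     # f(theta) for exponential up times, rate lam
    g = mu / (mu + theta)       # g(theta) for exponential down times, rate mu
    return (1.0 - f) / (1.0 - f * g)


def demand_availability_quadrature(lam: float, mu: float, theta: float,
                                   t_max: float, n: int = 20_000) -> float:
    """Midpoint rule for the integral of theta*exp(-theta*t)*A_U(t) over [0, t_max]."""
    c = lam + mu
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        a_up = math.exp(-c * t) + (mu / c) * (1.0 - math.exp(-c * t))   # (2-2-5)
        total += theta * math.exp(-theta * t) * a_up * h
    return total


# Illustrative rates: mean up time 10, mean repair time 1, mean time to demand 20.
closed = demand_availability_transform(0.1, 1.0, 0.05)
numeric = demand_availability_quadrature(0.1, 1.0, 0.05, t_max=600.0)
```

For these rates both routes give about 0.913, i.e. $(\mu+\theta)/(\lambda+\mu+\theta) = 1.05/1.15$; the truncation point `t_max` must be large compared with $1/\theta$.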

Note 4: In the present setup an item is in only one of two states: available or unavailable. No use is made of a concept of reduced operability at this stage, although such may indeed occur.
2.2.0 A Mathematical Model: General Independent Up and Down Times

Assume that the $\{U_i\}$ are mutually independent and identically distributed with distribution function (d.f.) $F(x)$, and that the $\{D_i\}$ have similar properties with d.f. $G(y)$. Assume also that the up and down times are mutually independent (a model without this latter assumption has been suggested and discussed by Gaver [2]).


2.2.1 Derivation of A(t)

Suppose that initially the system is just beginning an up time, and the availability at time $t$ is to be calculated. Denote by $C = U_1 + D_1$ the time to complete exactly one failure-repair cycle. The time $C$ has distribution function

$$
P\{C \le z\} = (F * G)(z) = \int_0^z F(z - y)\, dG(y), \tag{2-2-1}
$$

where $*$ denotes the conventional convolution operation. The system is up at time $t$ if either (i) it has never failed by time $t$, an event with probability $P\{U_1 > t\} = 1 - F(t)$, or (ii) it has failed and been repaired before $t$, and is up again at $t$. Writing $A_U(t)$ for the availability given that the system is up initially, this says mathematically that

$$
A_U(t) = 1 - F(t) + \int_0^t A_U(t - z)\, d(F * G)(z), \tag{2-2-2}
$$

an integral equation for $A_U(t)$. If the item is initially down the equation changes, but $A_D(t)$ is easily expressed in terms of $A_U(t)$:

$$
A_D(t) = \int_0^t A_U(t - z)\, dG(z). \tag{2-2-3}
$$

This expression simply says that if the item begins a down time at $t = 0$ which lasts until time $z$, it then starts in an up condition, as in (2-2-2), and so is up at time $t$ with probability $A_U(t - z)$; integrating over $z$ gives (2-2-3).
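When no closed form is available, (2-2-2) can also be solved numerically on a time grid. The sketch below is one simple discretization (our choice, not a method given in the text): it builds $F * G$ from (2-2-1) by a midpoint rule, then steps (2-2-2) forward, approximating $A_U(t - z)$ on each grid cell by the mean of its two end values and solving for the current grid value at each step. The function name, step size, and rates are illustrative. For exponential up and down times the result can be checked against the closed form (2-2-5).

```python
import math
from typing import Callable, List, Tuple


def availability_from_up(F: Callable[[float], float],
                         G: Callable[[float], float],
                         t_max: float, h: float) -> Tuple[List[float], List[float]]:
    """Approximate A_U(t) of (2-2-2) on the grid t = 0, h, 2h, ..., t_max."""
    n = int(round(t_max / h))
    t = [i * h for i in range(n + 1)]
    # Increments of the repair-time d.f. G over each grid cell.
    dG = [G(t[i]) - G(t[i - 1]) for i in range(1, n + 1)]
    # Cycle-length d.f. H = F*G of (2-2-1), midpoint rule in y.
    H = [0.0] * (n + 1)
    for k in range(1, n + 1):
        H[k] = sum(F(t[k] - (i - 0.5) * h) * dG[i - 1] for i in range(1, k + 1))
    dH = [0.0] + [H[i] - H[i - 1] for i in range(1, n + 1)]
    A = [1.0] * (n + 1)  # A_U(0) = 1 because the unit starts up
    for k in range(1, n + 1):
        # On cell i, A_U(t_k - z) runs from A[k-i+1] to A[k-i]; use their mean.
        # The i = 1 cell involves A[k] itself, so it is moved to the left side.
        rhs = 1.0 - F(t[k]) + 0.5 * A[k - 1] * dH[1]
        for i in range(2, k + 1):
            rhs += 0.5 * (A[k - i] + A[k - i + 1]) * dH[i]
        A[k] = rhs / (1.0 - 0.5 * dH[1])
    return t, A


lam, mu = 0.1, 1.0   # illustrative failure and repair rates
ts, a_up = availability_from_up(lambda x: 1.0 - math.exp(-lam * x),
                                lambda y: 1.0 - math.exp(-mu * y),
                                t_max=10.0, h=0.02)
```

The cost grows as the square of the number of grid points, which is acceptable for a few thousand points; any other d.f.s (e.g. gamma or Weibull) can be passed in place of the exponential ones.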


2.2.2 Solution for A(t)

In general, a usable closed-form solution to the integral equations (2-2-2) and (2-2-3) is not available. One exception is notable, namely that in which up and down times are exponentially distributed. That is,

$$
F(x) = 1 - e^{-\lambda x}, \qquad G(y) = 1 - e^{-\mu y}. \tag{2-2-4}
$$

Equations (2-2-2) and (2-2-3) then yield the formulas

$$
A_U(t) = e^{-(\lambda+\mu)t} + \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right] \tag{2-2-5}
$$

and

$$
A_D(t) = \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right]. \tag{2-2-6}
$$

Note 1: If the item is initially up, its availability decreases until the steady-state value $\mu/(\lambda+\mu)$ is reached. Likewise, if the item is initially down, its availability increases to $\mu/(\lambda+\mu)$. In both cases the steady-state values are the same, and the approach is governed by the rate $\lambda+\mu$ (equivalently, by the "time constant" $1/(\lambda+\mu)$).

Note 2: The steady-state availability is of the form

$$
\lim_{t\to\infty} A_U(t) = \lim_{t\to\infty} A_D(t) = \frac{E[U]}{E[U] + E[D]}. \tag{2-2-7}
$$

Thus, in the long run, the system availability is the average length of an up period divided by the average "cycle length", where a "cycle" is defined to be an up period plus the following down period. The validity of equation (2-2-7) does not depend upon the form of the distributions of $U$ and $D$ beyond their means. (In the exponential case $E[U] = 1/\lambda$ and $E[D] = 1/\mu$, and (2-2-7) reduces to $\mu/(\lambda+\mu)$.)

To find the general solution to equations (2-2-2) and (2-2-3) the Laplace transform technique may be used. If one takes Laplace transforms in (2-2-2), the transform of the availability is found to be

$$
\hat{A}_U(s) = \frac{1}{s}\,\frac{1 - f(s)}{1 - f(s)\,g(s)}, \tag{2-2-8}
$$

see Reference [2], where

$$
\hat{A}_U(s) = \int_0^\infty e^{-st} A_U(t)\, dt, \qquad f(s) = \int_0^\infty e^{-sx}\, dF(x), \qquad g(s) = \int_0^\infty e^{-sy}\, dG(y). \tag{2-2-9}
$$

In principle, the transform (2-2-8) provides the desired time-dependent solution; the inversion of (2-2-8), however, is sometimes difficult. Several "practical" remarks are in order.


Note 1: If $f(s)$ and $g(s)$ are both rational functions of $s$, e.g., if $F$ and $G$ are Erlang:

$$
\frac{dF}{dx} = k\lambda\, e^{-k\lambda x}\,\frac{(k\lambda x)^{k-1}}{(k-1)!}, \qquad f(s) = \left(\frac{k\lambda}{k\lambda + s}\right)^{k}, \tag{2-2-10}
$$

for $k$ a positive integer, and

$$
\frac{dG}{dy} = j\mu\, e^{-j\mu y}\,\frac{(j\mu y)^{j-1}}{(j-1)!}, \qquad g(s) = \left(\frac{j\mu}{j\mu + s}\right)^{j}, \tag{2-2-11}
$$

again for $j$ a positive integer, then explicit, but messy, mathematical inversion can be accomplished. Numerical results can be obtained by writing a FORTRAN program, and very possibly even by use of a programmable hand-held calculator. Since almost any distribution function can be well approximated by a distribution having a rational Laplace transform, the above procedure can be carried out in practice.
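As an alternative to explicit partial-fraction inversion, (2-2-8) can be inverted numerically. The sketch below uses the Gaver-Stehfest algorithm, one widely used method of this kind that builds on Gaver's work; it is not claimed to be the specific procedure of Gaver [1]. The Erlang transforms (2-2-10) and (2-2-11) are substituted into (2-2-8). All function names and numbers are illustrative. With $k = j = 1$ the result should agree with (2-2-5), and for large $t$ it should approach the steady state (2-2-7), $\mu/(\lambda+\mu)$, whatever $k$ and $j$ are.

```python
import math
from typing import Callable, List


def stehfest_weights(n: int) -> List[float]:
    """Stehfest weights V_1, ..., V_n (n must be even)."""
    half = n // 2
    weights = []
    for k in range(1, n + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += (j ** half * math.factorial(2 * j)) / (
                math.factorial(half - j) * math.factorial(j) * math.factorial(j - 1)
                * math.factorial(k - j) * math.factorial(2 * j - k))
        weights.append((-1) ** (k + half) * total)
    return weights


def invert_laplace(transform: Callable[[float], float], t: float, n: int = 14) -> float:
    """Gaver-Stehfest approximation to the inverse Laplace transform at t > 0."""
    a = math.log(2.0) / t
    return a * sum(v * transform(k * a)
                   for k, v in enumerate(stehfest_weights(n), start=1))


def availability_erlang(t: float, lam: float, mu: float, k: int, j: int) -> float:
    """A_U(t) for Erlang-k up times (mean 1/lam) and Erlang-j down times (mean 1/mu)."""
    def f(s: float) -> float:        # (2-2-10)
        return (k * lam / (k * lam + s)) ** k

    def g(s: float) -> float:        # (2-2-11)
        return (j * mu / (j * mu + s)) ** j

    def a_hat(s: float) -> float:    # (2-2-8)
        return (1.0 - f(s)) / (s * (1.0 - f(s) * g(s)))

    return invert_laplace(a_hat, t)
```

In double precision the weights grow quickly with `n`, so values much above 14 to 16 tend to lose accuracy through cancellation; $n = 14$ is a common compromise.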


Note 2: Computer programs have been developed for numerically inverting Laplace transforms, cf. Gaver [1], and application of one of these is also practically possible. One must have the Laplace transforms of the component distribution functions $F$ and $G$ in order to achieve the final result. In practice, again, one may well have observations from the latter: $u_1, u_2, \ldots, u_n$ and $d_1, d_2, \ldots, d_m$ ($n = m$ or $n \ne m$, for the sample sizes need not be the same). Now one can:


a) fit a plausible analytic form, e.g., a member of the gamma family, to $F$ and $G$:

$$
f(s) = \left(\frac{k\lambda}{k\lambda + s}\right)^{k}, \qquad g(s) = \left(\frac{j\mu}{j\mu + s}\right)^{j}, \tag{2-2-12}
$$

where now $k$ and $j$ need not be integers. The fits can be determined by maximum likelihood, or by the moment-matching method, i.e., by equating the theoretical distribution's mean, variance, etc., to the corresponding mean and variance of the sample data, and then solving for the distribution's parameters.
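A minimal sketch of the moment-matching step for the gamma form (2-2-12): matching the sample mean $\bar{u}$ and sample variance $s_u^2$ gives $k = \bar{u}^2/s_u^2$ and $\lambda = 1/\bar{u}$ (so the gamma rate is $k\lambda$), and likewise $j$ and $\mu$ from the down-time data. The use of the $n-1$ divisor for the variance, the function names, and the data values are assumptions for illustration only. The fitted $f$ and $g$ can then be substituted into (2-2-8).

```python
import statistics
from typing import Callable, Sequence, Tuple


def gamma_moment_fit(sample: Sequence[float]) -> Tuple[float, float]:
    """Return (shape, mean_rate) with shape = mean**2 / variance, mean_rate = 1 / mean."""
    mean = statistics.mean(sample)
    var = statistics.variance(sample)        # n - 1 divisor (an assumption)
    return mean ** 2 / var, 1.0 / mean


def gamma_transform(shape: float, mean_rate: float) -> Callable[[float], float]:
    """Laplace-Stieltjes transform (rate / (rate + s))**shape, rate = shape * mean_rate."""
    rate = shape * mean_rate
    return lambda s: (rate / (rate + s)) ** shape


# Hypothetical up-time and down-time observations (hours).
up_times = [1.0, 2.0, 3.0, 4.0, 5.0]
down_times = [0.2, 0.4, 0.3, 0.5]
k_hat, lam_hat = gamma_moment_fit(up_times)
j_hat, mu_hat = gamma_moment_fit(down_times)
f_hat = gamma_transform(k_hat, lam_hat)
g_hat = gamma_transform(j_hat, mu_hat)
```

Note that the steady-state availability (2-2-7) uses only the fitted means, $(1/\hat\lambda)/(1/\hat\lambda + 1/\hat\mu)$; the shapes matter only for the transient behaviour.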


The actual operating characteristics of the above approaches, and of variations thereof, remain to be evaluated. Very likely an experimental sampling, or Monte Carlo, approach will be required to shed light on their performance.
2.3.1 Steady-State System Availability


System availability depends upon the availability of its subsystems and upon the operational logic. Suppose a system is composed of $N$ elements, and assume that the up and down times (i.e., times to failure and repair times) of each element are statistically independent of those of the other elements. Then the system availability can be calculated from the element availabilities. Because of the independence assumption, this particular model may not be applicable to the common failure mode situation, or to situations in which the repair of elements involves a waiting queue (insufficient repairmen).

Let $A_i$ denote the steady-state availability of the $i$th element; then, as in Eq. (2-2-7),

$$
A_i = \frac{E[U_i]}{E[U_i] + E[D_i]}, \tag{2-3-1}
$$

and the unavailability of the $i$th element, $\bar{A}_i$, is given by

$$
\bar{A}_i = 1 - A_i.
$$

The availability of several types of systems is derived below.


System Type 1. N Unit Redundant

If $N$ elements are arranged in parallel, i.e., redundantly, so that the system operates if, and only if, at least one element operates, then, on the basis of element independence, the system unavailability is

$$
\bar{A} = \bar{A}_1 \bar{A}_2 \cdots \bar{A}_N = \prod_{i=1}^{N} \bar{A}_i, \tag{2-3-2}
$$

or, equivalently, the availability is

$$
A = 1 - (1 - A_1)(1 - A_2)\cdots(1 - A_N) = 1 - \prod_{i=1}^{N} (1 - A_i). \tag{2-3-3}
$$

System Type 2. M out of N Unit Redundant

If $N$ items are arranged in a system so that the system operates if at least $M$ of them operate ($1 \le M \le N$), then the system availability can be computed (again using the independence assumption) as follows:

(a) For a given $m$, compute the probability that each set of exactly $m$ units operates (the remaining $N - m$ units do not operate). There are $\binom{N}{m} = \frac{N!}{m!\,(N-m)!}$ such sets. Add these individual probabilities.

(b) Add the probabilities of (a) for $m = M, M+1, \ldots, N$. This is the required result.


As an illustration, consider the two out of three system; here $M = 2$, $N = 3$. The results of steps (a) and (b) are as follows:

(a) $m = 2$: $A_1 A_2 \bar{A}_3 + A_1 \bar{A}_2 A_3 + \bar{A}_1 A_2 A_3$;
    $m = 3$: $A_1 A_2 A_3$.

(b) The system availability is

$$
A = A_1 A_2 A_3 + A_1 A_2 \bar{A}_3 + A_1 \bar{A}_2 A_3 + \bar{A}_1 A_2 A_3. \tag{2-3-4}
$$
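Steps (a) and (b) amount to enumerating all up/down patterns of the $N$ units and summing the probabilities of those with at least $M$ units up. A brute-force sketch under the independence assumption (the function name and availabilities are hypothetical) reproduces (2-3-4); it costs $2^N$ terms, so it is practical only for small $N$.

```python
from itertools import product
from typing import Sequence


def m_out_of_n_by_enumeration(avail: Sequence[float], m_required: int) -> float:
    """Availability of an M-out-of-N system of independent units, steps (a)-(b)."""
    total = 0.0
    for pattern in product((0, 1), repeat=len(avail)):   # 1 = unit up
        if sum(pattern) >= m_required:
            prob = 1.0
            for a_i, up in zip(avail, pattern):
                prob *= a_i if up else 1.0 - a_i
            total += prob
    return total


a1, a2, a3 = 0.9, 0.8, 0.7   # illustrative element availabilities
two_of_three = m_out_of_n_by_enumeration([a1, a2, a3], 2)
```

Here (2-3-4) gives $0.504 + 0.216 + 0.126 + 0.056 = 0.902$; with `m_required = 1` the same routine reproduces the parallel formula (2-3-3).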
A recursive scheme to calculate system availability is now described.

Procedure

(1) Enumerate the elements, the $i$th being called Element $i$, $i = 1, 2, \ldots, N$.

(2) Define

(a) $a(j,k)$ = probability that exactly $j$ out of the first $k$ elements are up ($0 \le j \le k$);

(b) $A(M,N)$ = availability of an $M$ out of $N$ system,

$$
A(M,N) = \sum_{j=M}^{N} a(j,N). \tag{2-3-5}
$$

(c) Compute $a(j,k)$ for $0 \le j \le k \le N$ from

$$
a(j,k) = a(j,k-1)\,\bar{A}_k + a(j-1,k-1)\,A_k \tag{2-3-6}
$$

to obtain $a(j,N)$, $M \le j \le N$, where

$$
a(0,1) = \bar{A}_1, \qquad a(1,1) = A_1, \tag{2-3-7}
$$

and $a(j,k) = 0$ whenever $j < 0$ or $j > k$; in particular,

$$
a(k,k) = A_1 A_2 \cdots A_k. \tag{2-3-8}
$$

(d) Compute

$$
A(M,N) = \sum_{j=M}^{N} a(j,N).
$$

This is the required availability.

In order to explain the recursive formula (2-3-6), notice that $j$ out of the first $k$ elements are available if either $j$ out of the first $k-1$ are available and the $k$th is unavailable, or $j-1$ out of the first $k-1$ are available and the $k$th is available.

A return to the previous example illustrates the technique. First,

$$
a(1,2) = a(1,1)\,\bar{A}_2 + a(0,1)\,A_2 = A_1 \bar{A}_2 + \bar{A}_1 A_2. \tag{2-3-9}
$$

Next, using (2-3-8) and also (2-3-9),

$$
a(2,3) = a(2,2)\,\bar{A}_3 + a(1,2)\,A_3 = A_1 A_2 \bar{A}_3 + (A_1 \bar{A}_2 + \bar{A}_1 A_2)\,A_3. \tag{2-3-10}
$$

Since $a(3,3) = A_1 A_2 A_3$ according to (2-3-8), adding this to (2-3-10) delivers the required result (2-3-4), by (2-3-5).
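The recursion (2-3-6) is a small dynamic program: only the column $a(\cdot, k-1)$ is needed to form $a(\cdot, k)$, so the work grows as $N^2$ rather than as the $2^N$ terms of direct enumeration. In the sketch below, starting from $a(0,0) = 1$ (a convention not stated in the text) reproduces (2-3-7) at the first step; the function name and data are hypothetical.

```python
from typing import Sequence


def m_out_of_n_recursive(avail: Sequence[float], m_required: int) -> float:
    """A(M, N) of (2-3-5), computing a(j, k) column by column with (2-3-6)."""
    n = len(avail)
    col = [1.0] + [0.0] * n          # a(j, 0): with no elements, j = 0 surely
    for k, a_k in enumerate(avail, start=1):
        new = [0.0] * (n + 1)
        for j in range(k + 1):
            down = col[j] * (1.0 - a_k)                  # a(j, k-1) * Abar_k
            up = col[j - 1] * a_k if j >= 1 else 0.0     # a(j-1, k-1) * A_k
            new[j] = down + up
        col = new                    # col[j] now holds a(j, k)
    return sum(col[m_required:])


two_of_three = m_out_of_n_recursive([0.9, 0.8, 0.7], 2)
```

For the availabilities 0.9, 0.8, 0.7 this returns 0.902, the value of (2-3-4).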
System Type 3. M out of N Unit Redundant, Identically Available Units

This is the same system logic as immediately above. But since the units are believed to have equal availabilities, the binomial distribution can be used to calculate system availability from component availability:

$$
A = \sum_{j=M}^{N} \binom{N}{j} A_0^{\,j} (1 - A_0)^{N-j}. \tag{2-3-11}
$$

Here $A_0$ denotes the common availability of the individual units. The binomial sum (2-3-11) has been extensively tabulated, and so is convenient to use, if appropriate.
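Where tables are not at hand, (2-3-11) is a one-line computation; `math.comb` (Python 3.8 and later) supplies the binomial coefficient. The function name and numbers below are illustrative.

```python
from math import comb


def m_out_of_n_identical(a0: float, m_required: int, n: int) -> float:
    """Binomial form (2-3-11) for n identical, independent units."""
    return sum(comb(n, j) * a0 ** j * (1.0 - a0) ** (n - j)
               for j in range(m_required, n + 1))


print(m_out_of_n_identical(0.9, 2, 3))   # two-out-of-three with A_0 = 0.9
```

With $A_0 = 0.9$ this gives $3(0.81)(0.1) + 0.729 = 0.972$, which agrees with (2-3-4) when $A_1 = A_2 = A_3 = 0.9$.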


2.3.2 More Complex Models: Transients, Dependence


In order to deal with more complex models of system availability it is useful to employ Markov process models (see Gaver and Thompson [4] or Karlin [5] for an introduction). Only a brief discussion will be given here, and that in terms of examples.


Example 1. Single Unit

Consider a single system element or unit, with failure rate $\lambda(t)$ and repair rate $\mu(t)$ at time $t$. The time dependence of these rates may be used to represent reliability growth: $\lambda(t)$ may well decrease with time because initial difficulties are found and removed, and $\mu(t)$ may increase because of greater familiarity with the system on the part of those responsible for its maintenance.


Let $P_0(t)$ be the probability that the unit is up at time $t$, and $P_1(t) = 1 - P_0(t)$ be the probability that it is down for repair. Then the probability that the unit is up at time $t + h$ can be written as follows:

$$
P_0(t+h) = P_0(t)\,[1 - \lambda(t)h] + P_1(t)\,\mu(t)h + R(t,h). \tag{2-3-12}
$$

In other words, Equation (2-3-12) states that the unit is up at $t + h$ ($h > 0$) if (i) it is up at $t$ (probability $P_0(t)$) and does not fail during the time from $t$ to $t + h$ (probability approximately $1 - \lambda(t)h$), or (ii) it is down at $t$ (probability $P_1(t)$) and is repaired between $t$ and $t + h$ (probability approximately $\mu(t)h$). Other possibilities have the probability $R(t,h)$, which according to the Markov assumption is small compared to $h$ (literally, the limit of $R(t,h)/h$ as $h$ tends to zero is zero). Note that neither the time since the last failure, nor the time that repair has been going on, influences the probability of a state change. This is the "Markov property".

Now subtract $P_0(t)$ from both sides of Equation (2-3-12), divide by $h$, and let $h$ tend to zero. The result is the differential equation

$$
\frac{dP_0(t)}{dt} = -\lambda(t)P_0(t) + \mu(t)P_1(t) = -[\lambda(t) + \mu(t)]\,P_0(t) + \mu(t). \tag{2-3-13}
$$

The solution may be expressed as

$$
P_0(t) = P_0(0)\,e^{-r(t)} + \int_0^t e^{-[r(t) - r(z)]}\,\mu(z)\,dz, \tag{2-3-14}
$$
where $r(t) = \int_0^t [\lambda(x) + \mu(x)]\,dx$, and $P_0(0)$ is the probability that the system is up at time $t = 0$. If $\lambda(t) = \lambda$ and $\mu(t) = \mu$ are constants, then

$$
P_0(t) = P_0(0)\,e^{-(\lambda+\mu)t} + \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right], \tag{2-3-15}
$$

so that if the system is up initially, $P_0(0) = 1$,

$$
P_0(t) = \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t} + \frac{\mu}{\lambda+\mu}, \tag{2-3-16}
$$

while if it is down for repair initially, $P_0(0) = 0$,

$$
P_0(t) = \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right]. \tag{2-3-17}
$$

It may be observed that expressions (2-3-16) and (2-3-17) describe the effect of initial conditions on availability at time $t$, as described in Section 2.2.2; they coincide with (2-2-5) and (2-2-6). As $t \to \infty$ in either expression, $P_0(t)$, which is equal to $A(t)$, the probability that the unit is available, approaches $A$, the steady-state expression (2-2-7), by virtue of the fact that $E[U] = \lambda^{-1}$ and $E[D] = \mu^{-1}$.
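When $\lambda(t)$ and $\mu(t)$ vary with time, (2-3-13) can be integrated numerically instead of evaluating the integral in (2-3-14). A minimal sketch using the classical fourth-order Runge-Kutta method follows; the reliability-growth rate functions in the usage lines are hypothetical shapes chosen only for illustration. With constant rates the result should reproduce (2-3-16) and (2-3-17).

```python
import math
from typing import Callable


def up_probability(lam: Callable[[float], float], mu: Callable[[float], float],
                   p0_initial: float, t_end: float, n_steps: int = 1000) -> float:
    """Integrate dP0/dt = -(lam(t) + mu(t)) P0 + mu(t), eq. (2-3-13), by RK4."""
    def deriv(t: float, p: float) -> float:
        return -(lam(t) + mu(t)) * p + mu(t)

    h = t_end / n_steps
    t, p = 0.0, p0_initial
    for _ in range(n_steps):
        k1 = deriv(t, p)
        k2 = deriv(t + h / 2, p + h * k1 / 2)
        k3 = deriv(t + h / 2, p + h * k2 / 2)
        k4 = deriv(t + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return p


# Hypothetical reliability growth: the failure rate falls from 0.2 toward 0.05
# and the repair rate rises from 0.5 toward 1.0 on a time scale of about 50.
growth = up_probability(lambda t: 0.05 + 0.15 * math.exp(-t / 50.0),
                        lambda t: 1.0 - 0.5 * math.exp(-t / 50.0),
                        p0_initial=1.0, t_end=100.0)
```

The step size must be small compared with $1/(\lambda(t)+\mu(t))$; the same stepping scheme extends directly to the multi-unit systems of Example 2.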




Example 2. Three Units

If there are several units, then the system state must describe which are up. For instance,

$P_{000}(t)$ = probability that Units 1, 2, 3 are up at $t$;
$P_{100}(t)$ = probability that Unit 1 is down and Units 2 and 3 are up at $t$;
$\ldots$
$P_{111}(t)$ = probability that Units 1, 2, 3 are down at $t$.

There are, in all, eight states, (0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), (1,1,1), and differential equations may be written for their associated probabilities. By the same argument as that used to derive equation (2-3-13), the system of equations (the parameters can be time-dependent) is as follows:

$$
\frac{d}{dt}P_{000}(t) = -(\lambda_1+\lambda_2+\lambda_3)\,P_{000}(t) + \mu_1 P_{100}(t) + \mu_2 P_{010}(t) + \mu_3 P_{001}(t), \tag{2-3-18}
$$

$$
\frac{d}{dt}P_{100}(t) = -(\mu_1+\lambda_2+\lambda_3)\,P_{100}(t) + \lambda_1 P_{000}(t) + \mu_2 P_{110}(t) + \mu_3 P_{101}(t), \tag{2-3-19}
$$

and so on for the remaining states, ending with

$$
\frac{d}{dt}P_{111}(t) = -(\mu_1+\mu_2+\mu_3)\,P_{111}(t) + \lambda_1 P_{011}(t) + \lambda_2 P_{101}(t) + \lambda_3 P_{110}(t). \tag{2-3-20}
$$

For the setup above it turns out that, since all units are independent, the solution can be expressed as products of solutions of single-unit problems, i.e., using equations (2-3-15) to (2-3-17) as appropriate.
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The  differential  equation  approach  can  be  used  to  model 
systems  in  which  component  availabilities  are  not  independent, 
perhaps  because  of  limited  repair  capability.  Suppose,  for  instance, 
that  there  is  only  one  repairman,  and  that  he  assigns  priority  to 
units  1,  2,  3 in  that  order  if  the  elements  are  down.  In  other 
words,  if  the  repairman  is  repairing  Unit  2,  and  if  Unit  1 fails, 
he  immediately  changes  to  Unit  1.  In  this  case,  equation  (2-3-18) 
remains  the  same  but  equation  (2-3-19)  becomes 

^ P100(t)  = ~ ()j1+x2+X3)  P100{t)  + X1  P000(t)'  (2-3-21) 

and 

P110(t)  = ~(VX3J  P110(t)  + X1  P010(t)  + X2  P100(t) 


* 


i"^.1  m.  ''  i m "■!'  ' wii 


The long-run or steady-state probabilities are derived by equating the derivatives to zero and solving the resulting system of linear equations, subject to the condition that the probabilities sum to one. It is recommended that a computer routine be used for this, as the explicit solution is very messy. The time-dependent or transient solution may also be obtained by numerically integrating the differential equations; a Runge-Kutta method will work well.

Finally, the availability can be calculated in an obvious way from the probabilities so obtained. For instance, if the system logic requires that at least one unit be operative, then

$$
A(t) = 1 - P_{111}(t), \tag{2-3-24}
$$

while if two out of three operative units are required, then

$$
A(t) = P_{000}(t) + P_{100}(t) + P_{010}(t) + P_{001}(t). \tag{2-3-25}
$$

More complex setups, including common mode failures, may be treated similarly.
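The steady-state recipe above can be sketched for the three-unit example: build the transition-rate (generator) matrix over the eight states, replace one balance equation by the normalization condition, and solve the linear system. The sketch covers both the independent-repair model of (2-3-18) to (2-3-20) and the single-repairman priority model of (2-3-21); it uses a plain Gauss-Jordan elimination so that only the standard library is needed. Function names and rate values are hypothetical. With independent repair the answer should match the product form, e.g. $1 - \prod_i \lambda_i/(\lambda_i+\mu_i)$ for (2-3-24).

```python
from itertools import product
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def _set(state: State, i: int, value: int) -> State:
    """Copy of state with unit i set to value (1 = down, 0 = up)."""
    return tuple(value if j == i else x for j, x in enumerate(state))


def generator(lams: Sequence[float], mus: Sequence[float],
              single_repairman: bool) -> Tuple[List[State], List[List[float]]]:
    """Transition-rate matrix Q over all up/down patterns of the units."""
    states = list(product((0, 1), repeat=len(lams)))
    index = {s: k for k, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for s in states:
        row = q[index[s]]
        for i, down in enumerate(s):
            if not down:                       # an up unit fails at rate lam_i
                row[index[_set(s, i, 1)]] += lams[i]
        down_units = [i for i, down in enumerate(s) if down]
        # Single repairman: only the lowest-numbered down unit is worked on
        # (priority 1, 2, 3). Otherwise each down unit has its own repairman.
        for i in (down_units[:1] if single_repairman else down_units):
            row[index[_set(s, i, 0)]] += mus[i]
        row[index[s]] = -sum(row)              # diagonal: minus total exit rate
    return states, q


def steady_state(q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination with pivoting."""
    n = len(q)
    # Balance equations are the rows of Q transposed; the last one is
    # replaced by the normalization condition.
    m = [[q[j][i] for j in range(n)] + [0.0] for i in range(n - 1)]
    m.append([1.0] * n + [1.0])
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def availability(lams: Sequence[float], mus: Sequence[float], m_required: int,
                 single_repairman: bool = True) -> float:
    """Steady-state probability that at least m_required units are up."""
    states, q = generator(lams, mus, single_repairman)
    pi = steady_state(q)
    return sum(p for s, p in zip(states, pi) if len(s) - sum(s) >= m_required)
```

Calling `availability(lams, mus, 1)` gives the long-run value of (2-3-24) and `availability(lams, mus, 2)` that of (2-3-25); comparing `single_repairman=True` with `False` shows how much the limited repair capability costs.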
PART III. APPROXIMATE CONFIDENCE LIMITS FOR SYSTEM AVAILABILITY


3.1.  General 

The steady-state system availability of a complex system depends upon the availabilities of its components and upon the system operational logic. Denote the system availability by $A_s$:

$$
A_s = \psi(A_1, A_2, \ldots, A_m), \tag{3-1-1}
$$

where $A_i$ is the steady-state availability of the $i$th component and the function $\psi$ is a system logic function, which describes system availability in terms of component availability.

Furthermore, under broad circumstances, and as a first approximation,

$$
A_i = \frac{E[U_i]}{E[U_i] + E[D_i]}, \tag{3-1-2}
$$

where $E[U_i]$ represents the expected up time, or time to failure, and $E[D_i]$ the expected down time of the $i$th subsystem.

Now judgments about, and experience with, the component availabilities $A_i$ will differ, so it may be natural and useful to represent this variability by probability distributions (somewhat in the spirit of Bayesian statistics; see DeGroot [2]). In fact, the Reactor Safety Report, WASH-1400, has adopted this notion; specifically, it assumes the logarithmic-normal distribution to describe the variability of $E[U_i]$ and $E[D_i]$, or equivalently of $\tilde{\lambda}_i = (E[U_i])^{-1}$ and $\tilde{\mu}_i = (E[D_i])^{-1}$, respectively. That is, the availabilities of similar components of type $i$ vary randomly; therefore

$$
\tilde{A}_i = \frac{\tilde{\mu}_i}{\tilde{\lambda}_i + \tilde{\mu}_i} \tag{3-1-3}
$$

is a statistical variable, where $\ln \tilde{\lambda}_i$ is Normal$(m_{\lambda_i}, \sigma^2_{\lambda_i})$ and $\ln \tilde{\mu}_i$ is Normal$(m_{\mu_i}, \sigma^2_{\mu_i})$. Consequently the availability of a system constructed of such elements is also a random variable.


The problem is to assign a probability to the event that the availability of a system exceeds a given lower bound, given the distributions of the component failure and repair rates. Equivalently, one can specify a lower bound, $a_s$, such that the system availability, $\tilde{A}_s$, exceeds it with a specified probability.

Under the assumptions made, the problem cannot be solved in a neat, closed-form manner. This part of the report proposes an approximation method which appears satisfactory, as indicated by a Monte Carlo simulation study; however, further investigations are recommended. The method is known as Linearizing System Availability Log-Odds (abbreviated LALOD).




3.2.1  Single  Component  System. 

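Consider first a system consisting of a single component, and express its availability in the following equivalent forms:

$$
A_s = \frac{\mu}{\lambda + \mu} = \frac{1}{1 + \lambda/\mu} = \frac{1}{1 + e^{\ln\lambda - \ln\mu}}. \tag{3-2-1}
$$

The parameters $\lambda$ and $\mu$ are realizations of the random variables $\tilde{\lambda}$ and $\tilde{\mu}$. Let $\tilde{L}_s$ be the LALOD variable,

$$
\tilde{L}_s = \ln\tilde{\lambda} - \ln\tilde{\mu}. \tag{3-2-2}
$$

In the WASH-1400 case, where $\tilde{\lambda}$ and $\tilde{\mu}$ are log-normally distributed (and taken to be independent), $\tilde{L}_s$ would be a normally distributed random variable with mean $m = m_\lambda - m_\mu$ and variance $\sigma^2 = \sigma_\lambda^2 + \sigma_\mu^2$. Furthermore, the LALOD variable $\tilde{L}_s$ can be expressed as a function of system availability,

$$
\tilde{L}_s = \ln\frac{1 - \tilde{A}_s}{\tilde{A}_s}; \tag{3-2-3}
$$

thus the distribution function of $\tilde{A}_s$ is given by

$$
P\{\tilde{A}_s \ge a_s\} = P\left\{\tilde{L}_s \le \ln\frac{1 - a_s}{a_s}\right\} = P\left\{\varepsilon \le \frac{1}{\sigma}\left[\ln\frac{1 - a_s}{a_s} - m\right]\right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{b_s} e^{-u^2/2}\, du, \tag{3-2-4}
$$

where $\varepsilon$ has a standard normal distribution (with mean zero and variance unity) and

$$
b_s = \frac{1}{\sigma}\left[\ln\frac{1 - a_s}{a_s} - m\right]. \tag{3-2-5}
$$

To construct the one-sided probability limit of $\tilde{A}_s$ for a given level of significance $\alpha$, equation (3-2-5) can be used to determine an $a_s$ value, since $b_s$ can be found from the standard normal probability table for that given value of $\alpha$; solving (3-2-5) gives $a_s = 1/(1 + e^{m + \sigma b_s})$.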
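For a single component the calculation is exact under the log-normal assumption. The sketch below evaluates (3-2-4) with the standard library's `statistics.NormalDist`, inverts (3-2-5) for the lower limit at a prescribed probability $p = P\{\tilde{A}_s \ge a_s\}$, and, as an optional check, estimates the same probability by Monte Carlo sampling of $\ln\tilde\lambda$ and $\ln\tilde\mu$. Independence of $\tilde\lambda$ and $\tilde\mu$, the function names, and all parameter values are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def prob_at_least(a_s, m_lam, s_lam, m_mu, s_mu):
    """P{A_s >= a_s} from (3-2-4) and (3-2-5); lam and mu independent log-normals."""
    m = m_lam - m_mu
    sigma = math.hypot(s_lam, s_mu)
    b_s = (math.log((1.0 - a_s) / a_s) - m) / sigma
    return NormalDist().cdf(b_s)


def lower_limit(p, m_lam, s_lam, m_mu, s_mu):
    """a_s with P{A_s >= a_s} = p, i.e. a_s = 1 / (1 + exp(m + sigma * b_s))."""
    m = m_lam - m_mu
    sigma = math.hypot(s_lam, s_mu)
    b_s = NormalDist().inv_cdf(p)
    return 1.0 / (1.0 + math.exp(m + sigma * b_s))


def prob_at_least_mc(a_s, m_lam, s_lam, m_mu, s_mu, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability, for checking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        lam = math.exp(rng.gauss(m_lam, s_lam))
        mu = math.exp(rng.gauss(m_mu, s_mu))
        if mu / (lam + mu) >= a_s:
            hits += 1
    return hits / n


# Hypothetical inputs: median failure rate 1e-3 per hour, median repair rate 0.1 per hour.
params = (math.log(1e-3), 1.0, math.log(0.1), 0.5)
a95 = lower_limit(0.95, *params)
```

At $p = 0.5$ the limit is the availability at the median rates, $0.1/(0.001 + 0.1)$; the Monte Carlo estimate at `a95` should be close to 0.95, differing only by sampling error.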


Note 1: The distribution function and the probability limit derived for the system availability $\tilde{A}_s$ are exact under the assumption that $\ln(\tilde{\lambda}/\tilde{\mu})$ is a normal random variable.


Note 2: The assumption that $\ln(\tilde{\lambda}/\tilde{\mu})$ is Normal is not the only possibility: under some circumstances another transformation may be more suitable. In fact, a transformation to a basic distribution other than the Normal may be indicated by data. In any case, the odds transformation is still helpful numerically. This particular transformation has been systematically explored by Cox in a data-analytical context, see Cox [1]. The same arguments that make it appealing in that context tend to recommend it for the present purposes.

Note 3: The log-odds transformation in equation (3-2-3) has range $-\infty < \tilde{L}_s < \infty$, corresponding to the domain $0 < \tilde{A}_s < 1$: $\tilde{A}_s = 0$ corresponds to $\tilde{L}_s = +\infty$, and $\tilde{A}_s = 1$ corresponds to $\tilde{L}_s = -\infty$. It is immaterial whether $\tilde{L}_s$ is defined as shown, or as the log of the inverted ratio. In either case, $\tilde{L}_s$ ranges over the natural region of definition of the normal distribution, and will tend to be more nearly normally distributed than $\tilde{A}_s$ itself.

3.2.2. Multiple Unit System

Now consider a system consisting of several units arranged in a redundant manner. The general procedure of the LALOD transformation is outlined below.

LALOD Procedure

(1) Form the system availability in terms of the component availabilities:

$$
A_s = \psi(A_1, \ldots, A_m).
$$

(3) Compute the center of the log-odds distribution:

$$
a_s = \psi(a_1, a_2, \ldots, a_m), \tag{3-2-7}
$$

where

$$
a_i = \frac{1}{1 + e^{m_i}}, \qquad m_i = m_{\lambda_i} - m_{\mu_i} = E[\ln\tilde{\lambda}_i] - E[\ln\tilde{\mu}_i]. \tag{3-2-8}
$$

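(4) Compute the linearized approximation to the variance of the log-odds availability by use of the formula

$$
\sigma_s^2 = \frac{1}{[a_s(1 - a_s)]^2} \sum_{i=1}^{m} \left(\frac{\partial\psi}{\partial a_i}\right)^2 a_i^2 (1 - a_i)^2 \sigma_i^2, \tag{3-2-9}
$$

where $\sigma_i^2 = \sigma_{\lambda_i}^2 + \sigma_{\mu_i}^2$ is the variance of the log-odds of component $i$.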


(5)  Express  the  system  log  odds  availability  as 


L =>  in 
s 


(3-2-10) 


where  e is  Normal  (0,1).  Thus,  by  using  equation  (3-2-4),  the 
following  approximation  is  obtained  for  the  probability  that  the 
availability  fo  a system  exceeds  a lower  bound  a : 
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4 


L 


P u - u i - P 
i -s  ~si 


i£  * (ln  hr3]' ,n  hr])0*1! 


_L_  f3  e-^2  du 
\Zjii  -oo 


(3-2-11) 


b„  = a * in 
s s 


[£y  (-&-)] 


From  equation  (3-2-11)  one  can  easily  determine  desired  probability 


limits.  To  determine  a „ such  that 

-s,p 


P<AS  2.  is, p>  ■ Pl 


(3-2-12) 


simply  compute 


ag  + (l-as)  e°seP 


(3-2-13) 


where  ep  is  the  p—  quantile  of  the  unit  normal; 


p = _L_  / e_liU  du, 


(3-2-14) 


v^r  ie 


available  from  tables  of  the  Normal  distribution. 
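The four steps above can be sketched in Python as follows. This is a minimal illustration, not the program used for this report: it assumes independent unit log odds, a user-supplied function `phi` implementing the system logic, and central differences (step size `h`) in place of analytic partial derivatives in (3-2-9); the name `lalod_lower_limit` is hypothetical.

```python
import math
from statistics import NormalDist


def lalod_lower_limit(phi, m, sigma, p, h=1e-6):
    """Approximate alpha_{s,p} such that P{A_s >= alpha_{s,p}} = p, eq. (3-2-13).

    phi   -- system logic function: list of unit availabilities -> A_s
    m     -- centers m_i = E[ln lambda_i] - E[ln mu_i] of the unit log odds
    sigma -- standard deviations sigma_i of ln(lambda_i / mu_i)
    p     -- desired probability level, 0 < p < 1
    h     -- step for central-difference estimates of d(phi)/d(a_i)
    """
    a = [1.0 / (1.0 + math.exp(mi)) for mi in m]        # eq. (3-2-8)
    a_s = phi(a)                                         # eq. (3-2-7)

    # Linearized variance of the system log odds, eq. (3-2-9)
    total = 0.0
    for i, (ai, si) in enumerate(zip(a, sigma)):
        up, down = list(a), list(a)
        up[i], down[i] = ai + h, ai - h
        dphi = (phi(up) - phi(down)) / (2.0 * h)
        total += (dphi * ai * (1.0 - ai) * si) ** 2
    sigma_s = math.sqrt(total) / (a_s * (1.0 - a_s))

    eps_p = NormalDist().inv_cdf(p)                      # eq. (3-2-14)
    return a_s / (a_s + (1.0 - a_s) * math.exp(sigma_s * eps_p))  # eq. (3-2-13)
```

For a one-out-of-two pair one would pass `phi = lambda a: 1 - (1 - a[0]) * (1 - a[1])`. For a single unit (`phi = lambda a: a[0]`) the result reduces to $1/(1 + e^{m + \sigma \varepsilon_p})$, which is the exact case of Note 1.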


Note 1: For a single unit, the log odds $\ln[(1-a_s)/a_s]$ of the center $a_s$ of Equation (3-2-7) is precisely the mean or expected value of the log-odds availability. Since the transformation tends to symmetrize $A_s$, $\ln[(1-a_s)/a_s]$ approximates the mean or center of the $L_s$ distribution when a system involves more than one unit.
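Note 2: Derivation of Equation (3-2-9) can be accomplished by first writing the differential

$$dL_s = -\frac{1}{A_s(1-A_s)} \sum_{i=1}^{m} \frac{\partial \phi}{\partial A_i}\, dA_i \qquad (3-2-15)$$

and then differentiating $\ln[(1-A_i)/A_i] = Z_i$, so that $dA_i = -A_i(1-A_i)\, dZ_i$ with $dZ_i = \sigma_i \varepsilon_i$, to express the local variation of $L_s$ near its center as

$$dL_s \approx \frac{1}{a_s(1-a_s)} \sum_{i=1}^{m} \frac{\partial \phi}{\partial a_i}\, a_i(1-a_i)\, \sigma_i \varepsilon_i, \qquad (3-2-16)$$

where the $\varepsilon_i$ are independent Normal(0,1) variables. Squaring and taking expectations results in the variance equation (3-2-9). The same basic procedure can be extended to handle correlations between units.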

To  demonstrate  the  application  of  LALOD  approximation, 
an  example  is  given  below: 

Example:  Two-Component  Redundant  System 

Consider a system which consists of two parallel redundant units; the operation logic is assumed to be one-out-of-two. Thus, the system unavailability or availability is given by

$$\bar{A}_s = \bar{A}_1 \bar{A}_2 \qquad (3-2-17)$$

or

$$A_s = 1 - (1-A_1)(1-A_2) = \phi(A_1, A_2), \qquad (3-2-18)$$

where $\bar{A} = 1 - A$ denotes unavailability.
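Thus step (1) gives

$$L_s = \ln\left[\frac{\bar{A}_1 \bar{A}_2}{1 - \bar{A}_1 \bar{A}_2}\right]. \qquad (3-2-19)$$

Step (2) yields

$$a_s = 1 - \left[1 + e^{-m_1}\right]^{-1} \left[1 + e^{-m_2}\right]^{-1},$$

from which the center of $L_s$ is found to be

$$E[L_s] \approx \ln\left[\frac{(1+e^{-m_1})^{-1}(1+e^{-m_2})^{-1}}{1 - (1+e^{-m_1})^{-1}(1+e^{-m_2})^{-1}}\right] = m_1 + m_2 - \ln\left[1 + e^{m_1} + e^{m_2}\right]. \qquad (3-2-20)$$

If $\lambda_i \ll \mu_i$, as is likely, then $m_i$ is negative, typically between -6 and -3. Hence the center of the $L_s$ distribution is likely to be near $m_1 + m_2$.

Next, step (3) approximates $\mathrm{Var}[L_s]$ by

$$\sigma_s^2 \approx \frac{a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2}{(1 - \bar{a}_1 \bar{a}_2)^2}, \qquad (3-2-21)$$

with $\bar{a}_i = 1 - a_i$, which can be expressed in terms of the $m_i$. Once again, if $\lambda_i \ll \mu_i$, then, for even moderately negative $m_i$, $\sigma_s^2 \approx \sigma_1^2 + \sigma_2^2$.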
In passing, note that the general $n$-component parallel system is equally easy to approximate in the manner described. For this system

$$E[L_s] \approx \ln\left[\frac{\prod_{i=1}^{n} \bar{a}_i}{1 - \prod_{i=1}^{n} \bar{a}_i}\right].$$
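The parallel-system formulas can be checked numerically. The sketch below is an illustration under the same independence assumption as (3-2-9); it computes the center and linearized variance of $L_s$ for $n$ units in parallel, using $\partial\phi/\partial a_i = \prod_{j \ne i} \bar{a}_j$. The function name `parallel_center_and_variance` is hypothetical.

```python
import math


def parallel_center_and_variance(m, sigma):
    """Center and linearized variance of L_s for n units in parallel (one-out-of-n).

    m, sigma -- unit log-odds centers m_i and standard deviations sigma_i
    Returns (center, variance); center = ln[q / (1 - q)] with q = prod(abar_i),
    the system unavailability evaluated at the unit centers.
    """
    a = [1.0 / (1.0 + math.exp(mi)) for mi in m]        # a_i, eq. (3-2-8)
    abar = [1.0 / (1.0 + math.exp(-mi)) for mi in m]    # abar_i = 1 - a_i, computed directly
    q = math.prod(abar)
    center = math.log(q / (1.0 - q))

    # eq. (3-2-9) with d(phi)/d(a_i) = product of the other abar_j
    total = 0.0
    for i, (ai, si) in enumerate(zip(a, sigma)):
        dphi = math.prod(abar[:i] + abar[i + 1:])
        total += (dphi * ai * abar[i] * si) ** 2
    return center, total / (q * (1.0 - q)) ** 2        # a_s (1 - a_s) = (1 - q) q
```

For two units this reproduces $m_1 + m_2 - \ln[1 + e^{m_1} + e^{m_2}]$ and $(a_1^2\sigma_1^2 + a_2^2\sigma_2^2)/(1 - \bar{a}_1\bar{a}_2)^2$; with $m_1 = -4$, $m_2 = -5$ both are already close to $m_1 + m_2$ and $\sigma_1^2 + \sigma_2^2$.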


3.3. Some Simulation Validations

Monte Carlo simulation is used to validate the adequacy of the proposed LALOD approximation. To do so, realizations of component availabilities are obtained as follows:

$$A_i = \frac{1}{1 + e^{m_i + \varepsilon_i \sigma_i}}, \qquad (3-3-1)$$

where $m_i$ and $\sigma_i$ are given, and where $\varepsilon_i$ represents a random normal number with mean zero and standard deviation unity. The system availability is then calculated according to the system logic function $\phi$ at the values of $A_i$. Identify each realization $A_s$ so obtained, and use equation (3-2-13) to obtain $\alpha_{s,p}$. Finally, compare the fraction of, say, $n = 1000$ repetitions that fall above $\alpha_{s,p}$ with the approximated probability $p$. If the fraction agrees with $p$ to within sampling error, the approximation method is supported.

Several such sampling validations are performed. The results, shown in Table 3.1, are in good agreement.

A detailed explanation of the simulation runs follows. Recall that if the statistical variable $X$ has the log-normal distribution, i.e., $\ln X \sim N(m, \sigma^2)$, then

$$E[X] = e^{m + \sigma^2/2}, \qquad (3-3-2)$$

$$\mathrm{Var}[X] = e^{2m + \sigma^2}\left(e^{\sigma^2} - 1\right),$$

and the coefficient of variation, as used here, is

$$C(X) = \frac{\mathrm{Var}[X]}{(E[X])^2} = e^{\sigma^2} - 1$$

(the square of the usual ratio of standard deviation to mean).

For the first case in Table 3.1, the choice $m_1 = \ln(10^{-3}/2)$ and $\sigma_1^2 = \ln 4$ was made for the population from which Component I was selected (the mean failure rate for that population is $10^{-3}$ per day, with coefficient of variation 3). Component II was selected at random from a population having mean failure rate $0.5 \times 10^{-3}$ per day.
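The population parameters quoted in Table 3.1 follow from (3-3-2) and the coefficient of variation defined above. A short arithmetic check, with a hypothetical helper name `lognormal_mean_and_c`:

```python
import math


def lognormal_mean_and_c(m, sigma2):
    """E[X] (eq. 3-3-2) and C(X) = Var[X] / E[X]^2 for ln X ~ N(m, sigma2)."""
    mean = math.exp(m + sigma2 / 2.0)
    var = math.exp(2.0 * m + sigma2) * (math.exp(sigma2) - 1.0)
    return mean, var / mean ** 2


# Case 1 of Table 3.1: failure-rate populations of Components I and II
print(lognormal_mean_and_c(math.log(1e-3 / 2), math.log(4)))  # approximately (1e-3, 3.0)
print(lognormal_mean_and_c(math.log(1e-3 / 4), math.log(4)))  # approximately (5e-4, 3.0)
```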

In all cases, the repair time was assumed to be exactly one day in duration, merely to simplify the sampling experiment. Next, the lower limit on system availability $\alpha_s(p)$ was computed using equation (3-2-13) with the above parameters and a particular value of $p$. A total of 1000 redundant systems were then simulated, and the fraction whose availability exceeded $\alpha_s(p)$ was obtained. It is these fractions that appear in the body of the table; for instance, in the first case 0.503 corresponds to $p = 0.5$, 0.790 to $p = 0.80$, and 0.959 to $p = 0.95$.

The  computer  program  utilized  to  produce  the  quoted 
results  will  also  simulate  more  complex  redundant  systems. 
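The sampling experiment can be sketched as follows for the one-out-of-two system of the first case of Table 3.1. This is an illustration only, not the computer program referred to above; it assumes repair rates fixed at one per day (so that $\sigma_i$ is the standard deviation of $\ln \lambda_i$), and the names `lalod_parallel_limit` and `simulate_fraction` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lalod_parallel_limit(m, sigma, p):
    """alpha_{s,p} of eq. (3-2-13) for a one-out-of-two system, eqs. (3-2-17)-(3-2-21)."""
    a = [1.0 / (1.0 + math.exp(mi)) for mi in m]
    abar = [1.0 / (1.0 + math.exp(-mi)) for mi in m]     # 1 - a_i, computed directly
    q = abar[0] * abar[1]                                 # 1 - a_s
    var = (a[0] ** 2 * sigma[0] ** 2 + a[1] ** 2 * sigma[1] ** 2) / (1.0 - q) ** 2
    eps_p = NormalDist().inv_cdf(p)
    a_s = 1.0 - q
    return a_s / (a_s + q * math.exp(math.sqrt(var) * eps_p))


def simulate_fraction(m, sigma, p, reps=1000, seed=1):
    """Fraction of simulated systems whose availability is at least alpha_{s,p}."""
    rng = random.Random(seed)
    limit = lalod_parallel_limit(m, sigma, p)
    hits = 0
    for _ in range(reps):
        # eq. (3-3-1), tracking the unavailability 1 - A_i directly
        abar = [1.0 / (1.0 + math.exp(-(mi + rng.gauss(0.0, 1.0) * si)))
                for mi, si in zip(m, sigma)]
        hits += (1.0 - abar[0] * abar[1]) >= limit        # eq. (3-2-18)
    return hits / reps


m_case1 = [math.log(1e-3 / 2), math.log(1e-3 / 4)]
s_case1 = [math.sqrt(math.log(4))] * 2
for p in (0.5, 0.8, 0.95):
    print(p, simulate_fraction(m_case1, s_case1, p))
```

The printed fractions play the role of the entries in Table 3.1; with 1000 repetitions they differ from run to run by sampling error of roughly $\sqrt{p(1-p)/1000}$.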

3.4. Conclusions

The LALOD procedure for constructing probability (Bayes prior) limits on system availability is computationally simple. Based on the simulation results to date, the method appears to be valid. Further validation experiments, and analytical investigations of the method, would seem to be indicated.

Two related general areas for further investigation are the following:

(a)  The  robustness  or  insensitivity  of  the  LALOD  method  to 
the  specific  assumption  of  the  log  normal  for  unit 
parameter  priors.  There  are  indications  that  the 
procedure  may  be  relatively  insensitive,  particularly 
when  used  to  evaluate  rather  complicated  redundant  systems, 
by  virtue  of  central  limit  theorem  effects. 

(b)  The  possibility  of  combining  the  LALOD  prior  approach 
with  data  to  form  a posterior,  in  the  strict  Bayesian 
sense.  Perhaps  better,  another  method  for  "borrowing 
strength"  from  experience  with  other  units  in  other 
locations  can  be  devised.  Also,  the  approximate 
normality  of  the  system  log  odds  may  be  exploited  to 
SAMPLE VALIDATIONS

Table 3.1
Two-Unit Redundant System
(1000 repetitions per case)

Case  Parameters                               p = 0.50   p = 0.80   p = 0.95
1     m1 = ln(10^-3/2),   σ1² = ln 4             0.503      0.790      0.959
      m2 = ln(10^-3/4),   σ2² = ln 4
      E[λ1] = 10^-3,  E[λ2] = 10^-3/2
      CV[λ1] = CV[λ2] = 3
2     m1 = ln(10^-1/√2),  σ1² = ln 2             0.531      0.835      0.962
      m2 = ln(10^-3/4),   σ2² = ln 4
      E[λ1] = 10^-1,  E[λ2] = 10^-3/2
      CV[λ1] = 1,  CV[λ2] = 3
3     m1 = ln(10^-1/√2),  σ1² = ln 2             0.483      0.805      0.955
      m2 = ln(10^-3/√2),  σ2² = ln 2
      E[λ1] = 10^-1,  E[λ2] = 10^-3
      CV[λ1] = 1,  CV[λ2] = 1

Note: For simplicity only, E[μi] = 1 and σ²(μi) = 0 throughout the above. Also, σi² = σ²(λi), and CV(·) stands for coefficient of variation.
PART IV. AVAILABILITY ESTIMATION BY
USE OF THE JACKKNIFE

4.1. General

Consider now the problem of estimating the availability of a single equipment from data on its up and down times: $u_1, u_2, \ldots, u_n$ and $d_1, d_2, \ldots, d_n$, respectively. By virtue of equation (2-2-7), namely,

$$A = \frac{E[U]}{E[U] + E[D]}, \qquad (4-1-1)$$

one could estimate $A$ as follows:

$$\hat{A} = \frac{\bar{u}}{\bar{u} + \bar{d}}, \qquad (4-1-2)$$

where as usual the bars denote averages:

$$\bar{u} = \frac{1}{n} \sum_{i=1}^{n} u_i, \qquad \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i. \qquad (4-1-3)$$

However, because $\bar{u}$ and $\bar{d}$ are only approximations to the true means, the resulting approximations for $A$ can be quite poor.

In  practice  it  will  be  of  interest  to  estimate  the 
availability  of  a single  equipment,  e.g.,  a power  plant,  or  a 
redundant  combination  of  equipments,  such  as  a safety  device,  by 
using observed time-to-failure and down time data. Also, an assessment of the stability of the estimates, perhaps in the form of confidence limits, will be desirable. Such a program can, in principle, be carried out by (i) postulating distributional forms for the up or failure times, $U$, and down or repair times, $D$, (ii) fitting the parameters of the latter distributions according to satisfactory statistical procedures, such as maximum likelihood or, possibly, Bayesian techniques, and (iii) substituting the parameter estimates into the availability formulas, such as equation (4-1-2). In order to find confidence limits, a linearization technique that relies on the asymptotic normality of maximum likelihood estimates may be employed.

This paper presents a procedure alternative to the above; it has been called the jackknife by J.W. Tukey. For further discussion see Mosteller and Tukey [12], also Cox and Hinkley [£], and Gray and Schucany [1£]; a review has recently been furnished by R.G. Miller [1£]. In brief, the jackknife method has the capacity to reduce the bias of estimates of such quantities as system availability, and also to furnish confidence limits that behave in a satisfactory manner (that is, economically enclose the true availability) despite the fact that the underlying distributions are unknown. Demonstration of these properties can be carried out mathematically when sample sizes are large, but in realistic situations the jackknife technique must be validated by Monte Carlo simulation. A number of such simulation results are presented in this paper, and comparisons with alternative methods are given.
4.2. A Jackknife Procedure for a Single Unit

Jackknifed estimates and confidence limits are constructed by successively leaving out parts of the available data to construct pseudovalues. These are then averaged, and the stability of the average is assessed by Student's t in order to obtain confidence limits. The procedure is as follows:

(1) First transform (see Mosteller and Tukey [12]) the estimate (4-1-2) of equation (2-2-7):

$$\ln\left(\frac{\hat{A}}{1 - \hat{A}}\right) = \ln \bar{u} - \ln \bar{d}; \qquad (4-2-1)$$

jackknifing will be carried out using the statistic $z = \ln \bar{u} - \ln \bar{d}$.

(2) Recompute $z$ repeatedly, leaving out successively the sample pairs $(u_1, d_1), (u_2, d_2), \ldots, (u_j, d_j), \ldots, (u_n, d_n)$:

$$z_{-j} = \ln\left[\sum_{i=1}^{j-1} u_i + \sum_{i=j+1}^{n} u_i\right] - \ln\left[\sum_{i=1}^{j-1} d_i + \sum_{i=j+1}^{n} d_i\right], \qquad j = 1, 2, \ldots, n \qquad (4-2-2)$$

(the common factor $1/(n-1)$ of the two averages cancels in the difference of logarithms).

(3) Compute the pseudovalues as follows:

$$Z_j = n z - (n-1) z_{-j}, \qquad j = 1, 2, \ldots, n; \qquad (4-2-3)$$

recall that $z$ is the result of computing the quantity to be jackknifed, leaving out none of the data.

(4) Compute the mean and variance of the pseudovalues:

$$\bar{Z} = \frac{1}{n} \sum_{j=1}^{n} Z_j, \qquad s_Z^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left(Z_j - \bar{Z}\right)^2. \qquad (4-2-4)$$

(5) The jackknifed point estimate of the availability is

$$\hat{A}_{JK} = \frac{e^{\bar{Z}}}{1 + e^{\bar{Z}}}. \qquad (4-2-5)$$

(6) "Symmetric" two-sided confidence limits at confidence level $(1-\alpha)100\%$ are derived as follows:

$$H_\alpha = \bar{Z} + t_{1-\alpha/2}(n-1)\, \frac{s_Z}{\sqrt{n}}, \qquad L_\alpha = \bar{Z} - t_{1-\alpha/2}(n-1)\, \frac{s_Z}{\sqrt{n}}, \qquad (4-2-6)$$

where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2)100\%$ quantile of Student's t with $n-1$ degrees of freedom. Then

$$\frac{e^{L_\alpha}}{1 + e^{L_\alpha}} \le A \le \frac{e^{H_\alpha}}{1 + e^{H_\alpha}} \qquad (4-2-7)$$

with confidence approximately $(1-\alpha)100\%$. Note that the confidence limits are symmetric on the log-odds scale, around $\bar{Z}$, which estimates $\ln(E[U]/E[D])$; they are not symmetric around $A$.

(7) One-sided confidence limits at confidence level $(1-\alpha)100\%$ are derived as follows:

$$H_\alpha = \bar{Z} + t_{1-\alpha}(n-1)\, \frac{s_Z}{\sqrt{n}}, \qquad L_\alpha = \bar{Z} - t_{1-\alpha}(n-1)\, \frac{s_Z}{\sqrt{n}}, \qquad (4-2-8)$$

so a one-sided upper confidence limit is

$$A \le \frac{e^{H_\alpha}}{1 + e^{H_\alpha}}, \qquad (4-2-9)$$

and a lower confidence limit is

$$A \ge \frac{e^{L_\alpha}}{1 + e^{L_\alpha}}, \qquad (4-2-10)$$

both at confidence level $(1-\alpha)100\%$.
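Steps (1) to (6) translate directly into a short routine. The sketch below is illustrative rather than the authors' program; because the Python standard library has no Student's t quantile, the value $t_{1-\alpha/2}(n-1)$ is passed in from tables, and the names `jackknife_availability` and `expit` are hypothetical.

```python
import math


def expit(x):
    """Inverse of the log-odds transform: e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def jackknife_availability(u, d, t_quantile):
    """Jackknife point estimate and two-sided limits for A = E[U] / (E[U] + E[D]).

    u, d       -- paired up times and down times (same length n >= 2)
    t_quantile -- Student's t quantile t_{1-alpha/2}(n-1), taken from tables
    Returns (A_JK, lower, upper) following eqs. (4-2-1) to (4-2-7).
    """
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su) - math.log(sd)                        # eq. (4-2-1): ratio of sums = ratio of means
    pseudo = []
    for uj, dj in zip(u, d):
        z_minus_j = math.log(su - uj) - math.log(sd - dj)  # eq. (4-2-2)
        pseudo.append(n * z - (n - 1) * z_minus_j)         # eq. (4-2-3)
    z_bar = sum(pseudo) / n                                # eq. (4-2-4)
    s2 = sum((zj - z_bar) ** 2 for zj in pseudo) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)                  # eq. (4-2-6)
    return expit(z_bar), expit(z_bar - half), expit(z_bar + half)  # eqs. (4-2-5), (4-2-7)
```

For instance, 95% two-sided limits with $n = 25$ pairs use the tabulated $t_{0.975}(24) = 2.064$; one-sided limits (4-2-8) follow by passing $t_{1-\alpha}(n-1)$ instead and keeping only one end.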
4.3. Validation by Simulation

The jackknife procedure may be validated, in an empirical sense, by sampling experiments or computer simulation in the following manner. First, an artificial batch or sample of data is obtained by drawing random numbers from postulated distributions for $U$ and for $D$. For example, $\{u_i\}$ and $\{d_i\}$ are independently sampled from exponential distributions with means 100 and 1, respectively. Second, the jackknife point estimate (step (5) of Section 4.2) and confidence limits (equation (4-2-7)) are computed. Since the values of $E[U]$ and $E[D]$ are known, so is the theoretical value of $A$. The jackknife confidence intervals can then be checked for coverage: if the interval contains $A$, the particular interval covers, while otherwise (if $A$ falls below the lower limit or above the upper limit) it does not cover. Finally, the above procedure can be repeated many times (say 1000), and the fraction of the repetitions whose interval contains the true value of $A$ is recorded. This coverage fraction should desirably be close to $1 - \alpha$. Also, the average length, and variance of length, of the confidence intervals obtained in repeated sampling can be recorded. The jackknife confidence limits procedure can be said to be robust of validity if the actual coverage is close to the nominal coverage, $1 - \alpha$, for a wide range of distributions for $U$ and $D$. The procedure can be said to be robust of efficiency if the confidence limits tend to be short, i.e., if there is evidence that $E[H_\alpha] - E[L_\alpha]$ is comparable to the length of confidence intervals obtained when the underlying distributional families for $U$ and $D$ are known, and the most efficient estimation procedures pertinent to these families are used. Without the evidence available from a very large data base, the choice of specific distributional forms for $U$ and $D$ must be based on judgment. The following example situations seem to reflect the types of distributional behavior that may occur.

(A). $U$ is exponentially distributed, $E[U] = \lambda^{-1}$.
     $D$ is exponentially distributed, $E[D] = \mu^{-1}$.
     Successive times to failure and repair times are independent.
     Note: This is the widely seen Markov model; it is mathematically
     convenient, and may well be reasonably accurate under many
     circumstances.

(B). $U$ is exponentially distributed. $D$ is gamma distributed with
     shape parameter $k$ greater than unity: $E[D] = \mu^{-1}$,
     $\mathrm{Var}[D] = (\sqrt{k}\,\mu)^{-2}$.
     Note: the gamma family with $k > 1$ qualitatively represents data
     that are more tightly grouped around their mean than is true of
     exponentially distributed data. The logarithmic-normal distribution
     also has the above general property, and has been used to represent
     repair times; see Gray and Schucany [9].
(C). U is exponentially distributed, E[U] =

D is  gamma,  with  k integer  (>1);  U and 
the  subsequent  D positively  correlated. 

Note:  Situations  in  which  repair  times 

following  longer-than-average  times  to  failure 
are  themselves  longer-than-average  can  be 
imagined.  A class  of  models  is  discussed  in 
Gaver  [6].  The  present  simulation  is  a 
simplified  version  of  such  a structure. 

(D). $U$ is represented by a long-tailed h-distribution, see Gaver and
     Lavenburg [7], and Rogers and Tukey [13]: $U$ is obtained by
     transforming a variable $X$ with tail parameter $h > 0$, where $X$ is
     exponentially distributed with unit mean. The distribution of $U$
     possesses exponential-like characteristics near zero, but exhibits
     relatively more extremely large times to failure than does the
     exponential.
     $D$ is exponential; $E[D] = \mu^{-1}$.


The  above  alternatives  are  by  no  means  exhaustive,  but 
do  tend  to  represent  qualitatively  likely  alternative  data  behaviors. 
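A coverage experiment of the kind just described can be sketched for models (A) and (B). This is an illustration under stated assumptions, not the simulation behind Tables 4.1 and 4.2: it assumes $n = 25$ pairs per sample (so that $t = 2.064$, the value quoted for Table 4.1, presumably applies), means $E[U] = 100$ and $E[D] = 1$, and gamma repair times with shape `k` (`k = 1` gives model (A)); all function names are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def jackknife_limits(u, d, t_quantile):
    """Two-sided jackknife limits on A, steps (1) to (6) of Section 4.2."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    pseudo = [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]
    z_bar = sum(pseudo) / n
    s2 = sum((p - z_bar) ** 2 for p in pseudo) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)
    return expit(z_bar - half), expit(z_bar + half)


def coverage(n=25, reps=1000, mean_u=100.0, mean_d=1.0, k=1, t_quantile=2.064, seed=0):
    """Fraction of jackknife intervals that contain the true availability.

    U is exponential with mean mean_u; D is gamma with shape k and mean mean_d,
    so Var[D] = mean_d**2 / k (k = 1: model (A); k > 1: model (B)).
    """
    rng = random.Random(seed)
    true_a = mean_u / (mean_u + mean_d)
    hits = 0
    for _ in range(reps):
        u = [rng.expovariate(1.0 / mean_u) for _ in range(n)]
        d = [rng.gammavariate(k, mean_d / k) for _ in range(n)]
        lo, hi = jackknife_limits(u, d, t_quantile)
        hits += lo <= true_a <= hi
    return hits / reps


print("model (A):", coverage())
print("model (B), k = 4:", coverage(k=4))
```

Average interval lengths, needed to judge robustness of efficiency, could be accumulated in the same loop.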

As the following tabulations indicate, the jackknife appears to perform creditably when data come from any one of the models described. In particular, the validity of the jackknife is notable when a long-tailed (type D) distribution governs the times to failure.

Table 4.1
Simulation Experiments Validating Jackknife
Single-Unit Availability
95% Confidence Limits; Two-Sided (t = 2.064)

Table 4.2
Simulation Experiments Validating Jackknife
Single-Unit Availability

In case (A) of Tables 4.1 and 4.2, the ratio $\bar{U}/\bar{D}$ is proportional to a variable having the F distribution of classical statistics, with degrees of freedom in the numerator (denominator) equal to twice the number of up time (down time) observations. This fact allows exact confidence intervals to be established in case (A), and in case (A) alone, for any sample size. The jackknife coverage and confidence interval width compare favorably to the exact "F" method in case (A), and the jackknife seems correspondingly more valid and efficient in the other cases considered. This is particularly true for the long-tailed distributions of type (D); here the "F" method considerably undercovers.
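For reference, the exact case (A) interval can be written out explicitly. This is a sketch assuming exponential up times with rate $\lambda$ and exponential down times with rate $\mu$, independent samples of sizes $n_u$ and $n_d$, and $F_q(\nu_1, \nu_2)$ denoting the $q$ quantile of the F distribution; the symbols $\rho$, $n_u$, $n_d$ and $F_q$ are introduced here for illustration only.

$$2\lambda \sum_{i=1}^{n_u} u_i \sim \chi^2_{2n_u}, \qquad 2\mu \sum_{i=1}^{n_d} d_i \sim \chi^2_{2n_d} \quad\Longrightarrow\quad \frac{\lambda \bar{u}}{\mu \bar{d}} = \frac{\bar{u}/\bar{d}}{\rho} \sim F(2n_u, 2n_d), \qquad \rho = \frac{E[U]}{E[D]} = \frac{\mu}{\lambda}.$$

Inverting $F_{\alpha/2}(2n_u, 2n_d) \le (\bar{u}/\bar{d})/\rho \le F_{1-\alpha/2}(2n_u, 2n_d)$, and using the fact that $A = \rho/(1+\rho)$ increases with $\rho$, gives

$$\frac{\rho_L}{1+\rho_L} \le A \le \frac{\rho_H}{1+\rho_H}, \qquad \rho_L = \frac{\bar{u}/\bar{d}}{F_{1-\alpha/2}(2n_u, 2n_d)}, \qquad \rho_H = \frac{\bar{u}/\bar{d}}{F_{\alpha/2}(2n_u, 2n_d)}.$$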

4.4. Numerical Applications

In order to illustrate the behavior of the jackknifed estimation procedure, consider system time-to-failure and time-to-repair data for two nuclear plants, as quoted by Tietjens and Waller [14]. The data are tabulated in Table 4.3.

For each set of data, the jackknife pseudovalues are obtained by successively leaving out up and down time pairs, using equation (4-2-3). The two-sided confidence limits, equation (4-2-7), are then computed.
Table 4.3
Up Times {u_i} and Down Times {d_i} of Humboldt Bay
and Yankee Nuclear Reactors [14]

         Humboldt Bay                 Yankee Nuclear
   Up Times     Down Times       Up Times     Down Times
   (years)      (years)          (years)      (years)
   0.523        0.060            0.063        0.027
   0.175        0.038            0.055        0.038
   0.537        0.074            0.296        0.014
   1.019        0.197            0.170        0.036
   0.121        0.016            0.822        0.345
   0.827        0.088            0.948        0.197
   0.271        0.016            0.715        0.096
   0.499        0.066            0.923        0.255
   0.940        0.058            0.899        0.090
   0.466        0.099            0.332        0.033
   0.742        0.060            0.304        0.049
   0.189        0.058            0.658        0.107
   0.422        0.016            0.523        0.019
   0.389        0.222            0.712        0.148
   1.000        0.118            0.485        0.022
   0.003        0.047            0.397        0.030
   0.855        0.085            0.145        0.101
   1.077        0.153            0.912        0.019
   -            -                0.244        0.260
These confidence limits are compared to the limits obtained by Tietjens and Waller [14]. Table 4.4 shows that the jackknifed intervals fall within the F-statistic intervals, and also within the simulation intervals. As the simulation results of Section 4.3 indicate, the jackknife procedure gives more uniformly valid confidence intervals than does the F procedure when the underlying distributions are not known. This robustness, which sampling experiments have confirmed, is a point in favor of the jackknife from a practical viewpoint.


Table 4.4
Two-Sided 95% Confidence Limits on Plant Availability

Plant                   Method                          Lower Limit   Upper Limit
Yankee Nuclear (n=19)   Simulation (Tietjens-Waller)    0.710         0.909
                        Jackknife                       0.762         0.887
                        F                               0.729         0.906
Humboldt Bay (n=18)     Simulation (Tietjens-Waller)    0.779         0.923
                        Jackknife                       0.829         0.905
                        F                               0.778         0.930
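The jackknife rows of Table 4.4 can be approximately recomputed from Table 4.3. The sketch below assumes that the up and down times are paired row by row as tabulated and that the tabulated quantiles $t_{0.975}(17) = 2.110$ and $t_{0.975}(18) = 2.101$ were used; neither assumption is stated in the source, so small differences from the table are to be expected. The names are hypothetical.

```python
import math

# Table 4.3: paired up and down times in years
HUMBOLDT_UP = [0.523, 0.175, 0.537, 1.019, 0.121, 0.827, 0.271, 0.499, 0.940,
               0.466, 0.742, 0.189, 0.422, 0.389, 1.000, 0.003, 0.855, 1.077]
HUMBOLDT_DOWN = [0.060, 0.038, 0.074, 0.197, 0.016, 0.088, 0.016, 0.066, 0.058,
                 0.099, 0.060, 0.058, 0.016, 0.222, 0.118, 0.047, 0.085, 0.153]
YANKEE_UP = [0.063, 0.055, 0.296, 0.170, 0.822, 0.948, 0.715, 0.923, 0.899, 0.332,
             0.304, 0.658, 0.523, 0.712, 0.485, 0.397, 0.145, 0.912, 0.244]
YANKEE_DOWN = [0.027, 0.038, 0.014, 0.036, 0.345, 0.197, 0.096, 0.255, 0.090, 0.033,
               0.049, 0.107, 0.019, 0.148, 0.022, 0.030, 0.101, 0.019, 0.260]


def expit(x):
    """Inverse log odds: e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def jackknife_limits(u, d, t_quantile):
    """Two-sided jackknife limits on A, steps (1) to (6) of Section 4.2."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    pseudo = [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]
    z_bar = sum(pseudo) / n
    s2 = sum((p - z_bar) ** 2 for p in pseudo) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)
    return expit(z_bar - half), expit(z_bar + half)


# Assumed t quantiles: t_0.975(17) = 2.110, t_0.975(18) = 2.101
print("Humboldt Bay:", jackknife_limits(HUMBOLDT_UP, HUMBOLDT_DOWN, 2.110))
print("Yankee:      ", jackknife_limits(YANKEE_UP, YANKEE_DOWN, 2.101))
```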
4.5. Jackknifing System Availability

The topic of this section is the estimation of the availability of a system of several (two or more) equipments from time-to-failure and repair data. Again the jackknife technique is emphasized. Variations of this method are described and are again evaluated by means of simulation.

4.5.1 Two specific, simple systems will be considered here for illustration.

System Type 1. Two-Component Redundant.

Two subsystems are arranged in parallel, so that in order for the entire system to fail, both must be down simultaneously. If $A_i$ is the availability of the $i$th subsystem ($i = 1, 2$), then the system unavailability is

$$\bar{A} = (1-A_1)(1-A_2) = \bar{A}_1 \bar{A}_2 = \left(\frac{E[D_1]}{E[U_1] + E[D_1]}\right)\left(\frac{E[D_2]}{E[U_2] + E[D_2]}\right)$$

under the assumption that the two subsystems fail and are repaired independently. If there are $K$ such subsystems, then of course

$$\bar{A} = \prod_{i=1}^{K} \frac{E[D_i]}{E[U_i] + E[D_i]}.$$

System Type 2. Two-Out-of-Three Voting.

Suppose three subsystems are arranged to vote: when a demand is made on the system, the system is itself available if at least two of the three subsystems are available. The system availability in terms of subsystem availability is given below:

$$A = A_1 A_2 A_3 + \bar{A}_1 A_2 A_3 + A_1 \bar{A}_2 A_3 + A_1 A_2 \bar{A}_3. \qquad (4-4-2)$$

4.5.2 Some Jackknife Procedures

If a system consists of subsystems that are assumed to be identical and independent, then data on times to failure and times to repair can be pooled. The jackknife procedure discussed in Section 4.2 requires only a modest adaptation.

(A) Jackknifing System Type 1; Identical Subsystems.

Since subsystems behave identically, by assumption $E[D] = E[D_1] = E[D_2]$ and $E[U] = E[U_1] = E[U_2]$, so that

$$\bar{A} = \left(\frac{E[D]}{E[U] + E[D]}\right)^2, \qquad (4-4-3)$$

and thus

$$\ln\left(\frac{\sqrt{\bar{A}}}{1 - \sqrt{\bar{A}}}\right) = \ln E[D] - \ln E[U]. \qquad (4-4-4)$$

This suggests the following procedure.

(1) Transform:

$$\ln \bar{d} - \ln \bar{u} = -z. \qquad (4-4-5)$$

(2) Jackknife $z$, in the manner described in Section 4.2, pooling all up time and down time data. The previously reported sampling experiments for one equipment indicate the validity of the intervals so obtained; two-sided confidence limits are of this form:

$$\left(\frac{1}{1 + e^{H_\alpha}}\right)^2 \le \bar{A} \le \left(\frac{1}{1 + e^{L_\alpha}}\right)^2, \qquad (4-4-6)$$

and other limits are found in an analogous manner.
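The pooled procedure for identical subsystems amounts to the single-unit jackknife followed by the squaring in (4-4-6). The sketch below assumes the pairs from both subsystems are simply concatenated and that $t_{1-\alpha/2}(n-1)$, with $n$ the total number of pooled pairs, is supplied from tables; the function names are hypothetical.

```python
import math


def pseudovalues(u, d):
    """Jackknife pseudovalues of z = ln(u_bar / d_bar), eqs. (4-2-1) to (4-2-3)."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    return [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]


def identical_pair_unavailability_limits(u_pooled, d_pooled, t_quantile):
    """Two-sided limits (4-4-6) on the unavailability of two identical units in parallel."""
    z = pseudovalues(u_pooled, d_pooled)
    n = len(z)
    z_bar = sum(z) / n
    s2 = sum((zj - z_bar) ** 2 for zj in z) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)
    h_alpha, l_alpha = z_bar + half, z_bar - half
    # sqrt(Abar) = 1 / (1 + e^z), so both limits are squared
    return (1.0 / (1.0 + math.exp(h_alpha))) ** 2, (1.0 / (1.0 + math.exp(l_alpha))) ** 2
```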


(B) Jackknifing System Type 1; Different Subsystems.

It is often unrealistic to assume that redundant subsystems have identical parameters. In this case

$$\bar{A} = \bar{A}_1 \bar{A}_2 = \left(\frac{E[D_1]}{E[D_1] + E[U_1]}\right)\left(\frac{E[D_2]}{E[D_2] + E[U_2]}\right), \qquad (4-4-7)$$

and a logarithmic transformation is suggested:

$$\ln \bar{A} = \ln \bar{A}_1 + \ln \bar{A}_2;$$

it is this function that will be jackknifed. Let $u_{ki}$ denote the $i$th time to failure of equipment $k$ ($k = 1, 2$; $i = 1, 2, \ldots, n_k$), and let $d_{ki}$ be the corresponding down time. Here are two jackknife procedures.

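Procedure 1.

(1) Compute the pseudovalues $Z_{k,j}$ for each subsystem's data as described by equation (4-2-3).

(2) Compute the pseudovalues

$$Y_{k,j} = \ln \bar{A}_{k,j} = -\ln\left(1 + e^{Z_{k,j}}\right), \qquad k = 1, 2; \; j = 1, 2, \ldots, n_k. \qquad (4-4-8)$$

(3) The means and variances of the $Y_{k,j}$ are given by

$$M = \sum_{k=1}^{2} M_k, \qquad M_k = \frac{1}{n_k} \sum_{j=1}^{n_k} Y_{k,j}, \qquad (4-4-9)$$

$$s_k^2 = \frac{1}{n_k - 1} \sum_{j=1}^{n_k} \left(Y_{k,j} - M_k\right)^2, \qquad (4-4-10)$$

and

$$S^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}. \qquad (4-4-11)$$

(4) Two-sided $(1-\alpha)100\%$ confidence limits for $\ln \bar{A}$ are computed using Student's t:

$$H_\alpha = M + t_{1-\alpha/2}(n_1 + n_2 - 2)\, S, \qquad L_\alpha = M - t_{1-\alpha/2}(n_1 + n_2 - 2)\, S. \qquad (4-4-12)$$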
(5) Translated into confidence limits on $\bar{A}$, the limits become

$$H_\alpha(\bar{A}) = e^{H_\alpha} \quad \text{and} \quad L_\alpha(\bar{A}) = e^{L_\alpha}, \qquad (4-4-13)$$

respectively; these limits are analogous to those of equation (4-2-7).

Note 1: Procedure 1 directly assesses the variability of the individual estimates of $\bar{A}_1$ and $\bar{A}_2$ in terms of functions of the original pseudovalues.

Note 2: The procedure is essentially equivalent to the statistical independent-t test applied to the jackknifed data.
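Procedure 1 can be sketched as follows. The data layout (a list of `(u, d)` pairs of lists, one per subsystem) and the function names are hypothetical; $t_{1-\alpha/2}(n_1 + n_2 - 2)$ is assumed to come from tables.

```python
import math


def pseudovalues(u, d):
    """Jackknife pseudovalues of z = ln(u_bar / d_bar), eqs. (4-2-1) to (4-2-3)."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    return [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]


def procedure1_limits(data, t_quantile):
    """Procedure 1: limits (4-4-13) on Abar = Abar_1 * Abar_2.

    data       -- [(u_1, d_1), (u_2, d_2)], paired up/down times for each subsystem
    t_quantile -- t_{1-alpha/2}(n_1 + n_2 - 2) from tables
    """
    m_total, s2_total = 0.0, 0.0
    for u, d in data:
        y = [-math.log1p(math.exp(zj)) for zj in pseudovalues(u, d)]  # eq. (4-4-8)
        n = len(y)
        m_k = sum(y) / n                                              # eq. (4-4-9)
        s2_k = sum((yj - m_k) ** 2 for yj in y) / (n - 1)             # eq. (4-4-10)
        m_total += m_k
        s2_total += s2_k / n                                          # eq. (4-4-11)
    s = math.sqrt(s2_total)
    # eq. (4-4-12) on the log scale, then (4-4-13)
    return math.exp(m_total - t_quantile * s), math.exp(m_total + t_quantile * s)
```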

Procedure 2.

An alternative approach is to compute the jackknife estimate of the (un)availability of each subsystem, and then to assess and combine the variabilities of these estimates.

(1) Compute the pseudovalues $Z_{k,j}$, and the sample mean $\bar{Z}_k$ and sample variance $s_k^2$, of each subsystem's data.

(2) Calculate the logs of the jackknife point estimates,

$$M_k = \ln \bar{A}_{k,JK} = -\ln\left(1 + e^{\bar{Z}_k}\right), \quad k = 1, 2; \qquad M = M_1 + M_2. \qquad (4-4-14)$$

(3) Compute the variance of the jackknife point estimates by using the asymptotic "linearization" or "small errors" approach [5],

$$\mathrm{Var}\left[\ln \bar{A}_{k,JK}\right] \approx \left(\frac{e^{\bar{Z}_k}}{1 + e^{\bar{Z}_k}}\right)^2 \frac{s_k^2}{n_k}, \qquad k = 1, 2, \qquad (4-4-15)$$

and the variance of the point estimate $\ln \bar{A}_{JK}$ is

$$S^2 = \sum_{k=1}^{2} \mathrm{Var}\left[\ln \bar{A}_{k,JK}\right]. \qquad (4-4-16)$$

(4) Construct the confidence limits of system unavailability in the same manner as equations (4-4-12) and (4-4-13).
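Procedure 2 differs from Procedure 1 only in where the transformation is applied: the pseudovalues are averaged first, and the variance is carried through the linearization (4-4-15). A sketch with the same hypothetical data layout and a table-supplied t quantile:

```python
import math


def pseudovalues(u, d):
    """Jackknife pseudovalues of z = ln(u_bar / d_bar), eqs. (4-2-1) to (4-2-3)."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    return [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]


def procedure2_limits(data, t_quantile):
    """Procedure 2: limits on Abar = Abar_1 * Abar_2 via linearization."""
    m_total, s2_total = 0.0, 0.0
    for u, d in data:
        z = pseudovalues(u, d)
        n = len(z)
        z_bar = sum(z) / n
        s2 = sum((zj - z_bar) ** 2 for zj in z) / (n - 1)
        m_total += -math.log1p(math.exp(z_bar))           # eq. (4-4-14)
        a_k = 1.0 / (1.0 + math.exp(-z_bar))              # e^z / (1 + e^z)
        s2_total += a_k ** 2 * s2 / n                     # eq. (4-4-15), summed as (4-4-16)
    s = math.sqrt(s2_total)
    # limits formed as in eqs. (4-4-12) and (4-4-13)
    return math.exp(m_total - t_quantile * s), math.exp(m_total + t_quantile * s)
```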


(C) Jackknifing System Type 2; Different Subsystems.

The availability of a two-out-of-three voting system when components differ is given by equation (4-4-2). Suppose that up and down time data are known for the components; this section describes a jackknife procedure for attaching confidence limits to the system availability. The method given here relies upon the linearization technique used as the basis for Procedure 2 of (B).

Procedure:

(1) Form the pseudovalues $Z_{k,j}$ for the jackknife estimates of $\ln(E[U_k]/E[D_k])$, $k = 1, 2, 3$; $j = 1, 2, \ldots, n_k$.

(2) Compute

$$\bar{Z}_k = \frac{1}{n_k} \sum_{j=1}^{n_k} Z_{k,j}, \qquad s_k^2 = \frac{1}{n_k - 1} \sum_{j=1}^{n_k} \left(Z_{k,j} - \bar{Z}_k\right)^2, \qquad k = 1, 2, 3. \qquad (4-4-18)$$

(3) Compute the jackknife point estimate of system availability,

$$A_{JK} = A_{1,JK} A_{2,JK} A_{3,JK} + \bar{A}_{1,JK} A_{2,JK} A_{3,JK} + A_{1,JK} \bar{A}_{2,JK} A_{3,JK} + A_{1,JK} A_{2,JK} \bar{A}_{3,JK}, \qquad (4-4-19)$$

where $A_{k,JK} = e^{\bar{Z}_k}/(1 + e^{\bar{Z}_k})$ and $\bar{A}_{k,JK} = 1 - A_{k,JK}$, and its log-odds transform

$$\ell_{JK} = \ln\left(\frac{A_{JK}}{1 - A_{JK}}\right). \qquad (4-4-20)$$

(4) Compute the estimated variance of $\ell_{JK}$:

$$S_\ell^2 = \frac{1}{(A_{JK} \bar{A}_{JK})^2} \Big[ (A_{2,JK} \bar{A}_{3,JK} + \bar{A}_{2,JK} A_{3,JK})^2 (A_{1,JK} \bar{A}_{1,JK})^2 \frac{s_1^2}{n_1} + (A_{1,JK} \bar{A}_{3,JK} + \bar{A}_{1,JK} A_{3,JK})^2 (A_{2,JK} \bar{A}_{2,JK})^2 \frac{s_2^2}{n_2} + (A_{1,JK} \bar{A}_{2,JK} + \bar{A}_{1,JK} A_{2,JK})^2 (A_{3,JK} \bar{A}_{3,JK})^2 \frac{s_3^2}{n_3} \Big]; \qquad (4-4-21)$$

the latter is derived by linearizing equations (4-4-19) and (4-4-20) and combining.

(5) Two-sided $(1-\alpha)100\%$ confidence limits for $\ell$ are

$$H_\alpha = \ell_{JK} + t_{1-\alpha/2}(n_1 + n_2 + n_3 - 3)\, S_\ell, \qquad L_\alpha = \ell_{JK} - t_{1-\alpha/2}(n_1 + n_2 + n_3 - 3)\, S_\ell; \qquad (4-4-22)$$

two-sided confidence limits on $A$ are then given by equation (4-2-7).
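The two-out-of-three procedure can be sketched in the same way. The gradient terms below are $\partial A/\partial A_k$ for (4-4-2), for example $\partial A/\partial A_1 = A_2\bar{A}_3 + \bar{A}_2 A_3$. The data layout, the function names and the table-supplied value of $t_{1-\alpha/2}(n_1 + n_2 + n_3 - 3)$ are assumptions of this illustration.

```python
import math


def pseudovalues(u, d):
    """Jackknife pseudovalues of z = ln(u_bar / d_bar), eqs. (4-2-1) to (4-2-3)."""
    n = len(u)
    su, sd = sum(u), sum(d)
    z = math.log(su / sd)
    return [n * z - (n - 1) * math.log((su - uj) / (sd - dj)) for uj, dj in zip(u, d)]


def two_of_three_limits(data, t_quantile):
    """Returns (lower, A_JK, upper) for a two-out-of-three system, eqs. (4-4-18) to (4-4-22).

    data       -- [(u_1, d_1), (u_2, d_2), (u_3, d_3)], paired up/down times per subsystem
    t_quantile -- t_{1-alpha/2}(n_1 + n_2 + n_3 - 3) from tables
    """
    z_bars, var_means = [], []
    for u, d in data:
        z = pseudovalues(u, d)
        n = len(z)
        z_bar = sum(z) / n
        z_bars.append(z_bar)
        var_means.append(sum((zj - z_bar) ** 2 for zj in z) / (n - 1) / n)  # s_k^2 / n_k
    a = [1.0 / (1.0 + math.exp(-zb)) for zb in z_bars]      # A_k,JK
    ab = [1.0 / (1.0 + math.exp(zb)) for zb in z_bars]      # 1 - A_k,JK
    a_jk = (a[0] * a[1] * a[2] + ab[0] * a[1] * a[2]
            + a[0] * ab[1] * a[2] + a[0] * a[1] * ab[2])      # eq. (4-4-19)
    ell = math.log(a_jk / (1.0 - a_jk))                      # eq. (4-4-20)
    grads = [a[1] * ab[2] + ab[1] * a[2],
             a[0] * ab[2] + ab[0] * a[2],
             a[0] * ab[1] + ab[0] * a[1]]
    s2 = sum((g * a[k] * ab[k]) ** 2 * var_means[k] for k, g in enumerate(grads))
    s = math.sqrt(s2) / (a_jk * (1.0 - a_jk))                # eq. (4-4-21)
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return expit(ell - t_quantile * s), a_jk, expit(ell + t_quantile * s)  # (4-4-22), (4-2-7)
```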

4.6. Validation by Simulation

Sampling experiments designed to validate the procedures described do so in a satisfactory manner for the cases considered. The following tables illustrate the situation. Note that, unlike the single-unit case with exponential distributions, where the "F" statistic is available, there is no "exact" finite-sample procedure for these system configurations. Further sampling experiments, unreported here, also show that the nominal coverage is very nearly achieved in all cases.


Table  3.1 

Simulation  Experiments  Validating  Jackknife 